Instruction-level parallelism (ILP) in nonnumerical code is regarded as scarce and hard to 
exploit due to its irregularity. In this article, we introduce a new code-scheduling technique 
for irregular ILP called "selective scheduling," which can be used as a component for 
superscalar and VLIW compilers. Selective scheduling can compute a wide set of 
independent operations across all execution paths based on renaming and forward- ...
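Renaming is what allows a scheduler to treat two operations that merely reuse a register name as independent. The sketch below is a minimal straight-line illustration, not the selective-scheduling algorithm itself. It assumes a hypothetical `Op` record holding a destination register and its source registers. Giving every definition a fresh name removes write-after-read (WAR) and write-after-write (WAW) conflicts, so only true data dependences remain.

```python
import itertools
from typing import NamedTuple


class Op(NamedTuple):
    """Hypothetical three-address operation: dest <- f(srcs)."""
    dest: str
    srcs: tuple


def false_dependences(ops):
    """Count WAR and WAW pairs, i.e. conflicts caused only by reusing a name."""
    count = 0
    for j, later in enumerate(ops):
        for earlier in ops[:j]:
            if later.dest in earlier.srcs:   # write-after-read
                count += 1
            if later.dest == earlier.dest:   # write-after-write
                count += 1
    return count


def rename(ops):
    """Give every definition a fresh name and point each use at the latest one.

    Assumes the input never uses names of the form v0, v1, ... itself.
    Returns the renamed ops and the final architectural-to-fresh name map.
    """
    fresh = (f"v{n}" for n in itertools.count())
    current = {}  # architectural name -> most recent fresh name
    renamed = []
    for op in ops:
        # Read sources before recording this op's own definition.
        srcs = tuple(current.get(s, s) for s in op.srcs)
        new_dest = next(fresh)
        current[op.dest] = new_dest
        renamed.append(Op(new_dest, srcs))
    return renamed, current


if __name__ == "__main__":
    # r1 is reused: the third op conflicts with the first two only by name.
    ops = [Op("r1", ("a",)), Op("r2", ("r1",)), Op("r1", ("b",)), Op("r3", ("r1",))]
    renamed, final = rename(ops)
    print(false_dependences(ops), false_dependences(renamed))
    print(renamed)
```

After renaming, the third operation (`v2 <- b`) no longer conflicts with the first two, so a scheduler is free to hoist it. The final name map records which fresh name holds each architectural register; a real scheduler working across paths would also need to reconcile these maps at control-flow joins, which this straight-line sketch ignores.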

The foremost goal of superscalar processor design is to increase performance through the 
exploitation of instruction-level parallelism (ILP). Previous studies have shown that 
speculative execution is required for high instructions-per-cycle (IPC) rates in non-numerical 
applications. The general trend has been toward supporting speculative execution in 
complicated, dynamically-scheduled processors. Performance, though, is more than just a 
high IPC rate; it also depends upon instruction count ... 
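The point that performance depends on instruction count as well as IPC can be made concrete with the standard relation between execution time T, dynamic instruction count N, IPC, and clock frequency f. The numbers below are purely illustrative and do not come from the paper: at the same clock frequency, a binary B that executes 20% more instructions than binary A is still slower despite a 10% higher IPC.

```latex
T = \frac{N}{\mathrm{IPC}\cdot f},\qquad
\frac{T_B}{T_A} = \frac{N_B/\mathrm{IPC}_B}{N_A/\mathrm{IPC}_A}
= \frac{1.2\times 10^{9}/2.2}{1.0\times 10^{9}/2.0}
\approx \frac{0.545\times 10^{9}}{0.500\times 10^{9}} \approx 1.09
```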

The performance of Very Long Instruction Word (VLIW) microprocessors depends on the 
close cooperation between the compiler and the architecture. This paper evaluates a set of 
important compilation techniques and related architectural features for VLIW machines. The 
evaluation is performed on a SPARC-based VLIW testbed where gcc-generated optimized 
SPARC code is scheduled into high-performance VLIW code. As a base scheduling compiler, 
we experiment with three core scheduling techniques including ... 

on Programming language design and implementation, volume 38 issue 5 

Method inlining and data flow analysis are two major optimization components for effective 
program transformations; however, they often suffer from the existence of rarely or never 
executed code contained in the target method. One major problem lies in the assumption 
that the compilation unit is partitioned at method boundaries. This paper describes the 
design and implementation of a region-based compilation technique in our dynamic 
compilation system, in which the compiled regions are selected a ... 

Keywords: dynamic compilers, on-stack replacement, partial inlining, region-based 
compilation 
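The idea of keeping rarely executed code out of the compilation unit can be sketched as a profile-guided region walk. This is a hypothetical illustration, not the selection algorithm from the paper (whose description is truncated above). It assumes a control-flow graph given as a successor map, per-block execution counts, and a made-up `min_ratio` cutoff. Edges into rare blocks become region exits, which a dynamic compiler could handle by falling back to the interpreter or recompiling.

```python
from collections import deque


def select_region(cfg, counts, entry, min_ratio=0.01):
    """Return (region, exits) for a region grown from `entry`.

    cfg:    block -> list of successor blocks
    counts: block -> profiled execution count
    Blocks executed fewer than min_ratio * counts[entry] times are treated
    as rare and excluded; each edge into such a block is recorded as an exit.
    """
    threshold = min_ratio * counts[entry]
    region, exits = {entry}, set()
    work = deque([entry])
    while work:
        block = work.popleft()
        for succ in cfg.get(block, []):
            if counts.get(succ, 0) < threshold:
                exits.add((block, succ))   # rare path: leave it outside the region
            elif succ not in region:
                region.add(succ)
                work.append(succ)
    return region, exits


cfg = {"entry": ["hot", "cold"], "hot": ["exit"], "cold": ["exit"], "exit": []}
counts = {"entry": 1000, "hot": 999, "cold": 1, "exit": 1000}
print(select_region(cfg, counts, "entry"))
```

Here the `cold` block falls below the cutoff, so only `entry`, `hot`, and `exit` are compiled, and the edge from `entry` to `cold` is the single exit that would need a fallback.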

This paper describes the system overview of our Java Just-In-Time (JIT) compiler, which is 
the basis for the latest production version of the IBM Java JIT compiler that supports a diversity 
of processor architectures including both 32-bit and 64-bit modes, CISC, RISC, and VLIW 
architectures. In particular, we focus on the design and evaluation of the cross-platform 
optimizations that are common across different architectures. We studied the effectiveness 
of each optimization by selectively disabling ... 

Keywords: Java, just-in-time compiler, optimization 

Post-pass binary adaptation for software-based speculative precomputation 

Steve S.W. Liao, Perry H. Wang, Hong Wang, Gerolf Hoflehner, Daniel Lavery, John P. Shen 

May 2002 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2002 Conference 
on Programming language design and implementation, volume 37 issue 5 

Recently, a number of thread-based prefetching techniques have been proposed. These 
techniques aim to reduce the latency of single-threaded applications by leveraging 
multithreading resources to perform memory prefetching via speculative prefetch threads. 
Software-based speculative precomputation (SSP) is one such technique, proposed for 
multithreaded Itanium models. SSP does not require expensive hardware support; instead, it 
relies on the compiler to adapt binaries to perform prefetching on o ... 

Keywords: chaining speculative precomputation, delay minimization, dependence 
reduction, long-range thread-based prefetching, loop rotation, pointer, post-pass, 
prediction, scheduling, slack, slicing, speculation, triggering 
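Slicing here means extracting the instructions that compute a delinquent load's address, so that a speculative thread can run them ahead of the main thread. The following is a minimal sketch under strong assumptions: a straight-line trace of hypothetical `Op` records, with register data dependences only (no control or memory dependences, unlike a real post-pass tool operating on Itanium binaries).

```python
from typing import NamedTuple


class Op(NamedTuple):
    """Hypothetical trace entry: dest <- f(srcs), with a readable label."""
    dest: str
    srcs: tuple
    text: str


def backward_slice(trace, target):
    """Return indices of ops the target op transitively depends on (register flow only)."""
    needed = set(trace[target].srcs)   # registers whose producers are still missing
    picked = [target]
    for i in range(target - 1, -1, -1):
        op = trace[i]
        if op.dest in needed:
            needed.discard(op.dest)    # this op is the nearest producer...
            needed.update(op.srcs)     # ...so its own inputs are now needed
            picked.append(i)
    return sorted(picked)


trace = [
    Op("p", ("head",), "p = head"),
    Op("s", (), "sum = 0"),
    Op("v", ("p",), "v = p->val"),
    Op("s", ("s", "v"), "sum += v"),
    Op("p", ("p",), "p = p->next"),
    Op("x", ("p",), "x = p->val   (delinquent load)"),
]

for i in backward_slice(trace, 5):
    print(trace[i].text)
```

Only the pointer-chasing operations end up in the slice; the summation stays in the main thread, which is what keeps a precomputation thread short enough to run ahead.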

Dongkeun Kim, Donald Yeung 

August 2004 ACM Transactions on Computer Systems (TOCS), volume 22 issue 3 

Pre-execution is a promising latency tolerance technique that uses one or more helper 
threads running in spare hardware contexts ahead of the main computation to trigger 
long-latency memory operations early, hence absorbing their latency on behalf of the main 
computation. This article investigates several source-to-source C compilers for extracting 
pre-execution thread code automatically, thus relieving the programmer or hardware from 
this onerous task. We present an aggressive profile-driven co ... 

January 2002 ACM SIGPLAN Notices, Proceedings of the 2002 ACM SIGPLAN workshop 
37 Issue 3 

Abramov and Gluck have recently introduced a technique called URA for inverting first-order 
functional programs. Given some desired output value, URA computes a potentially infinite 
sequence of substitutions/restrictions corresponding to the relevant input values. In some 
cases this process does not terminate. In the present paper, we propose a new program 
analysis for inverting programs. The technique works by computing a finite grammar 
describing the set of all inputs that relate to a given ... 

Keywords: inference, program inversion, supercompilation 

November 1998 Proceedings of the 31st annual ACM/IEEE international symposium on 

By optimizing data layout at run-time, we can potentially enhance the performance of 
caches by actively creating spatial locality, facilitating prefetching, and avoiding cache 
conflicts and false sharing. Unfortunately, it is extremely difficult to guarantee that such 
optimizations are safe in practice on today's machines, since accurately updating all pointers 
to an object requires perfect alias information, which is well beyond the scope of the 
compiler for languages such as C. T ... 

The existence of statically detectable correlation among conditional branches enables their 
elimination, an optimization that has a number of benefits. This paper presents techniques 
to determine whether an interprocedural execution path leading to a conditional branch 
exists along which the branch outcome is known at compile time, and then to eliminate the 
branch along this path through code restructuring. The technique consists of a 
demand-driven interprocedural analysis that determines whethe ... 
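A small example shows what an interprocedurally correlated branch looks like. In the hypothetical functions below, the caller's test `i >= 0` already decides the callee's `i < 0` branch along that call path, so restructuring can route the call to a clone with the branch removed. This only illustrates the transformation; it is not the paper's analysis.

```python
def clamp_index(i, n):
    """Callee: its first branch is redundant whenever the caller knows i >= 0."""
    if i < 0:                 # branch B
        return 0
    return min(i, n - 1)


def lookup(i, n):
    """Original caller."""
    if i >= 0:                # branch A: on this path, B is known to be false
        return clamp_index(i, n)
    return -1


def clamp_index_nonneg(i, n):
    """Hypothetical clone of clamp_index specialised for the path where i >= 0."""
    return min(i, n - 1)      # branch B eliminated


def lookup_restructured(i, n):
    """Caller after restructuring: the correlated path calls the clone."""
    if i >= 0:
        return clamp_index_nonneg(i, n)
    return -1
```

The clone is needed because other callers of `clamp_index` may pass negative values, so the branch can only be removed along the path where its outcome is known.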

Whole program paths (WPP) are a new approach to capturing and representing a program's 
dynamic (actually executed) control flow. Unlike other path profiling techniques, which 
record intraprocedural or acyclic paths, WPPs produce a single, compact description of a 
program's entire control flow, including loop iteration and interprocedural paths. This paper 
explains how to collect and represent WPPs. It also shows how to use WPPs to find hot 
subpaths, which are the heavily executed ... 

Keywords: data compression, dynamic program measurement, path profiling, program 
control flow, program tracing 
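Finding hot subpaths can be illustrated on a raw execution trace of basic-block labels. The keywords point to data compression as part of how WPPs stay compact; this sketch skips compression entirely and simply counts fixed-length windows in an uncompressed trace. The cost measure (occurrences times length) and the `min_cost` cutoff are assumptions made for the example.

```python
from collections import Counter


def hot_subpaths(trace, length, min_cost):
    """Count every window of `length` consecutive blocks in an execution trace and
    keep those whose cost (occurrences * length) reaches min_cost."""
    windows = (tuple(trace[i:i + length]) for i in range(len(trace) - length + 1))
    counts = Counter(windows)
    return {path: n for path, n in counts.items() if n * length >= min_cost}


trace = ["A", "B", "C", "A", "B", "C", "A", "B", "D"]
print(hot_subpaths(trace, 3, 6))
```

A real trace is far too long to scan this way repeatedly, which is why a compact representation of the whole path matters.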

Instruction scheduling is a compiler optimization that can improve program speed, 
sometimes by 10% or more, but it can also be expensive. Furthermore, time spent 
optimizing is more important in a Java just-in-time (JIT) compiler than in a traditional one 
because a JIT compiles code at run time, adding to the running time of the program. We 
found that, on any given block of code, instruction scheduling often does not produce 
significant benefit and sometimes degrades speed. Thus, we hoped that we ... 

Keywords: Java, Jikes RVM, compiler optimization, instruction scheduling, machine 
learning, supervised learning 
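The abstract does not say which features or learner were used, so the following is only a hypothetical illustration of the supervised-learning setup. Given blocks labeled by whether scheduling paid off, and a single made-up numeric feature (for instance, the fraction of long-latency instructions in the block), it learns a one-threshold rule that decides whether to schedule a block at all.

```python
def train_threshold(examples):
    """Fit the rule 'schedule iff feature >= t' by minimising training errors.

    examples: list of (feature_value, scheduling_helped) pairs.
    Returns (t, errors); t == inf means 'never schedule'.
    """
    candidates = sorted({x for x, _ in examples}) + [float("inf")]
    best_t, best_err = None, None
    for t in candidates:
        errors = sum((x >= t) != helped for x, helped in examples)
        if best_err is None or errors < best_err:
            best_t, best_err = t, errors
    return best_t, best_err


def should_schedule(feature, threshold):
    """Apply the learned rule to a new block's (hypothetical) feature value."""
    return feature >= threshold


examples = [(0.1, False), (0.2, False), (0.3, False), (0.5, True), (0.7, True)]
print(train_threshold(examples))
```

The appeal of such a rule in a JIT is that evaluating it costs almost nothing, so the compiler spends scheduling time only on blocks predicted to benefit.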

Traditional list schedulers order instructions based on an optimistic estimate of the load 
latency imposed by the hardware and therefore cannot respond to variations in memory 
latency caused by cache hits and misses on non-blocking architectures. In contrast, 
balanced scheduling schedules instructions based on an estimate of the amount of 
instruction-level parallelism in the program. By scheduling independent instructions behind 
loads based on what the program can provide, rather than what ... 
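The contrast between a fixed, optimistic load latency and one derived from available parallelism can be sketched as follows. This is a deliberately simplified, hypothetical weighting, since the abstract does not give the exact formula: each load's latency estimate is 1 plus the number of instructions that are neither its predecessors nor its successors in the dependence DAG, that is, instructions that could be placed behind it to hide a miss.

```python
def reachable(succs, start):
    """All nodes reachable from start in the dependence DAG (excluding start)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in succs.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def load_weights(nodes, succs, loads):
    """Simplified estimate: 1 + number of instructions independent of each load."""
    reach = {n: reachable(succs, n) for n in nodes}
    weights = {}
    for load in loads:
        independent = [
            n for n in nodes
            if n != load and n not in reach[load] and load not in reach[n]
        ]
        weights[load] = 1 + len(independent)
    return weights


nodes = ["L1", "L2", "a", "u1", "u2"]
succs = {"L1": ["u1"], "L2": ["a"], "a": ["u2"]}
print(load_weights(nodes, succs, ["L1", "L2"]))
```

A list scheduler would then use these weights instead of a fixed cache-hit latency when computing priorities, so loads with more independent work around them are given more room.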

Microcode for mass produced architectures is still predominantly generated by hand. Yet, as 
speed dictates the vertical migration of commonly executed functions to microcode, the 
demand for automated code generation increases. Though considerably more complex than 
phase-decoupled methods, phase-coupled methods for the generation of horizontal 
microcode have the potential to produce more highly optimized microcode. Results of the 
retargetable phase-coupled microcode compiler, Hori ... 

Many of today's high-level parallel languages support dynamic, fine-grained parallelism. 
These languages allow the user to expose all the parallelism in the program, which is 
typically of a much higher degree than the number of processors. Hence an efficient 
scheduling algorithm is required to assign computations to processors at runtime. Besides 
having low overheads and good load balancing, it is important for the scheduling algorithm 
to minimize the space usage of the parallel program. T ... 
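Why scheduling order matters for space can be seen in a toy simulation. This is not the paper's algorithm (its description is truncated above); it simply assumes a single worker expanding a binary fork tree and compares a FIFO (breadth-first) ready queue with a LIFO (depth-first) one, reporting the peak number of pending tasks.

```python
from collections import deque


def peak_pending(depth, lifo):
    """Expand a binary fork tree of the given depth with one worker.

    Each task is represented only by its depth in the tree. With lifo=True the
    worker takes the newest task (depth-first); otherwise the oldest (breadth-first).
    Returns the largest number of tasks ever waiting in the ready queue.
    """
    ready = deque([0])
    peak = 1
    while ready:
        d = ready.pop() if lifo else ready.popleft()
        if d < depth:
            ready.append(d + 1)   # fork two children
            ready.append(d + 1)
        peak = max(peak, len(ready))
    return peak


print(peak_pending(10, lifo=False), peak_pending(10, lifo=True))
```

Breadth-first expansion holds 2^depth tasks at its peak, while depth-first holds only depth + 1. A space-efficient parallel scheduler has to stay close to the depth-first footprint while still keeping many processors busy.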